i2MapReduce: Incremental MapReduce for Mining Evolving Big Data

Authors

Yanfeng Zhang

Shimin Chen

Qiang Wang

Ge Yu

Abstract

As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs key-value pair level incremental processing rather than task level re-computation, (ii) supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and (iii) incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. We evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics. Experimental results on Amazon EC2 show significant performance improvements of i2MapReduce compared to both plain and iterative MapReduce performing re-computation.
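The idea of key-value pair level incremental processing for a one-step job can be illustrated with a small in-memory sketch. This is not the i2MapReduce implementation: the class `IncrementalJob`, its fields, and the word-count `map_fn`/`reduce_fn` are hypothetical names chosen for illustration, and the preserved state is assumed to fit in a Python dictionary rather than on disk. The sketch keeps the map output of every input record, re-runs the map function only for records that changed, and re-runs the reduce function only for the keys those records touch.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Word-count style map: emit (word, 1) for every word in the record."""
    return [(word, 1) for word in text.split()]


def reduce_fn(word, counts):
    """Sum all counts that were emitted for one word."""
    return sum(counts)


class IncrementalJob:
    """Hypothetical in-memory sketch; all names here are assumptions."""

    def __init__(self):
        # Preserved fine-grain state: reduce key -> {input key: [values]}.
        self.edges = defaultdict(dict)
        # Input key -> set of reduce keys that record emitted to.
        self.map_outputs = {}
        # Current reduce results.
        self.results = {}

    def _apply_record(self, in_key, value):
        """Drop the record's old map output, add its new one (if any)."""
        affected = set()
        for k2 in self.map_outputs.pop(in_key, set()):
            del self.edges[k2][in_key]
            affected.add(k2)
        if value is not None:
            emitted = defaultdict(list)
            for k2, v2 in map_fn(in_key, value):
                emitted[k2].append(v2)
            for k2, vals in emitted.items():
                self.edges[k2][in_key] = vals
                affected.add(k2)
            self.map_outputs[in_key] = set(emitted)
        return affected

    def run(self, delta):
        """delta maps input key -> new value, or None to delete the record."""
        affected = set()
        for in_key, value in delta.items():
            affected |= self._apply_record(in_key, value)
        # Re-reduce only the keys touched by changed records.
        for k2 in affected:
            values = [v for vals in self.edges[k2].values() for v in vals]
            if values:
                self.results[k2] = reduce_fn(k2, values)
            else:
                self.results.pop(k2, None)
                del self.edges[k2]
        return affected
```

The first call to `run` with the full input behaves like an ordinary job; later calls take only the inserted, updated, or deleted records. Task-level re-computation would instead redo every map task whose input split contains a changed record, which is the coarser granularity the abstract contrasts against.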


Similar resources

iMapReduce: Incremental Iterative MapReduce

Cloud intelligence applications often perform iterative computations (e.g., PageRank) on constantly changing data sets (e.g., Web graph). While previous studies extend MapReduce for efficient iterative computations, it is too expensive to perform an entirely new large-scale MapReduce iterative job to timely accommodate new changes to the underlying data sets. In this paper, we propose iMapReduc...


Incremental and Iterative Map reduce For Mining Evolving the Big Data in Banking System

Big data is a broad term for datasets so large and complex that traditional data processing applications are inadequate, so an i2MapReduce-based framework is used for incremental and iterative computations on big data. State-level processing computation easily retrieves the data and is also time consuming. Incremental and iterative MapReduce is the most widely used big data processing to...


A Survey on Parallel Rough Set Based Knowledge Acquisition Using MapReduce from Big Data

Nowadays, the volume of data is growing at an unprecedented rate, and big data mining and knowledge discovery have become a new challenge in the era of data mining and machine learning. Rough set theory for knowledge acquisition has been successfully applied in data mining. The MapReduce technique has received more attention from the scientific community as well as industry for its applicability in big d...


A Novel Mapreduce Lift Association Rule Mining Algorithm (Mrlar) for Big Data

Big Data mining is an analytic process used to discover the hidden knowledge and patterns from a massive, complex, and multi-dimensional dataset. A single processor's memory and CPU resources are very limited, which makes algorithm performance ineffective. Recently, there has been renewed interest in using association rule mining (ARM) in Big Data to uncover relationships between what seems t...


Large-scale incremental processing with MapReduce

An important property of today’s big data processing is that the same computation is often repeated on datasets evolving over time, such as web and social network data. While repeating full computation of the entire datasets is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Proc...




Publication date: 2015
